ABSTRACT


Over the past few decades, e-commerce has grown manifold. E-commerce websites ask their
customers to share their views about the products they have purchased, so millions of reviews
accumulate for the products on these websites. Customers read the reviews of a product before
they purchase it, and a product with more positive reviews tends to attract more buyers.
Classifying these vast collections of reviews into different categories is therefore the need of the
hour. This paper describes one such mechanism, which classifies reviews using text mining and extracts
the sentiment of each review. The proposed mechanism involves extracting product reviews from e-commerce
websites, identifying the terms that represent the sentiment towards products, highlighting the positive terms that
are more frequent, and specifying the frequencies of the different sentiment-defining terms in the product reviews.
Keywords: Classification, Amazon Reviews, Sentiment Analysis, WordNet, Positive Word Cloud


1. INTRODUCTION 


Covid-19 has changed the way customers
purchase products. Earlier, very few people
purchased products through online sales
applications. Now people are choosing online
platforms to purchase products, as social
distancing is the need of the hour. These online
sales applications, or e-commerce websites, ask
customers to give their opinions about the
products they have purchased, the services of the
online sales site, and so on. These customer reviews
give new buyers an overview of the
product and the services offered. In this way, a
huge collection of customer reviews
accumulates in every online sales application for
each product. Before purchasing a product on an
online sales platform, customers wish to see the
product reviews.


As the collection of reviews for a product
grows, different customers express different
opinions about the same product: there will be
positive as well as negative reviews. This creates
ambiguity for a prospective customer. So,
there is a need to analyse these reviews and
provide the overall customer opinion for a given
product.


To perform sentiment analysis and classification,
subjective information needs to be extracted from
the given review text in natural language. This
subjective information can be opinions and
sentiments. To extract opinions or sentiments
from the given customer review text, various
approaches like natural language processing, text
analysis, computational linguistics and biometrics
are available. Different machine learning
methods can be used for this semantic and review
analysis, as they are very efficient and simple to
implement.


Various e-commerce sites like Amazon, Flipkart,
Snapdeal etc. are available to customers for online
purchasing. Amazon is one of the e-commerce
giants, where thousands of purchases are
made every day. Customers give opinions
about a product, such as its properties, quality
and appearance, along with recommendations. These
recommendations give new buyers an insight into almost
all the features of the product.
They are helpful not only for consumers but also
for sellers, who can improve their services.
Manufacturers of products can also understand
the requirements of the customers better and can
make modifications to their products accordingly.


This paper discusses the extraction of
opinions or sentiments from customer reviews
and their analysis using machine learning
algorithms.


Sentiment analysis techniques can be broadly
categorized as (i) lexicon-based, (ii) machine
learning based and (iii) deep learning based.


In lexicon-based sentiment analysis, sentiment 
scores are assigned to words based on predefined 
sentiment lexicons or dictionaries. The overall 
sentiment of a document is then calculated based 
on the sentiment scores of its constituent words. 
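
The lexicon-based idea can be illustrated with a minimal Python sketch. The lexicon below is a hypothetical toy dictionary chosen only for illustration, and the names `LEXICON`, `lexicon_score` and `lexicon_label` are assumptions that do not come from the system described in this paper.

```python
import re

# Hypothetical toy lexicon (word -> sentiment score); not taken from the paper.
LEXICON = {"good": 1, "great": 2, "excellent": 2,
           "poor": -1, "bad": -1, "terrible": -2}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text, lexicon=LEXICON):
    """Sum the lexicon scores of the words in the text; unknown words score 0."""
    return sum(lexicon.get(token, 0) for token in tokenize(text))


def lexicon_label(text, lexicon=LEXICON):
    """Map the summed score to a positive, negative or neutral label."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A real lexicon would contain thousands of scored entries; the summation step stays the same.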


In machine-learning based sentiment analysis, 
supervised or unsupervised machine learning 
algorithms are used to train models on labelled 
data (where sentiments are annotated) or to cluster 
texts based on their sentiment patterns. 
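
For the supervised case, one possible sketch is a small multinomial Naive Bayes classifier with add-one smoothing, written here in plain Python. The choice of Naive Bayes, the function names and the four training reviews are all assumptions made for illustration; the paper does not state which learning algorithm, if any, was used.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_reviews):
    """Learn per-class document and word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_reviews:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label in sorted(class_counts):
        log_prob = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocabulary:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            log_prob += math.log((word_counts[label][token] + 1)
                                 / (total_words + len(vocabulary)))
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Hypothetical labelled training data.
TRAIN = [("good print quality", "positive"),
         ("great value good colours", "positive"),
         ("poor ink bad quality", "negative"),
         ("bad paper feed", "negative")]
MODEL = train_naive_bayes(TRAIN)
```

With this toy model, `classify("good colours", MODEL)` selects the positive class because both words occur only in positive training reviews.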


In deep-learning based sentiment analysis, deep 
learning architectures like Recurrent Neural 
Networks (RNNs), Convolutional Neural 
Networks (CNNs), or Transformer models are 
applied to learn complex patterns in text data for 
sentiment analysis. 


2. BACKGROUND 


E-commerce has become very popular, and it
allows customers to leave reviews on different
products. It is difficult for manufacturers to keep
track of the opinions of customers, as the reviews
form a voluminous collection for every product.
So, it is important to process such large and
complex data in order to derive useful information
from it. To tackle this problem, classification
methods, which are part of machine learning, can
be used. Classification is a data mining
technique that divides data into different
categories based on their characteristics (Pandey
et al. 2016; Rain 2013). Organizations want to
automate this classification process while
extracting data from large data sets (Liu et al.
2014).


Opinion mining, or sentiment analysis, is a natural
language processing (NLP) problem that
detects and mines subjective information from text
sources. The main aim of sentiment analysis is to
analyse the customer reviews written and classify
them as positive or negative. When reviews are
classified simply as positive or negative,
there is no need for the system to understand each
phrase or document (Liu 2015; Pang et al. 2002;
Turney & Littman 2003). However, labelling individual
words as positive or negative is not sufficient, and the
process involves some challenges. For example, the word
“excellent” has a positive polarity, but if this
word is preceded by the word “not”, the phrase
takes the opposite, negative polarity
(Singla et al. 2013). So, classifying words or
phrases by their prior polarity has some drawbacks.
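
The "not excellent" problem can be made concrete with a short Python sketch. It uses a simple heuristic that flips the polarity of a word when the immediately preceding word is a negator. This one-word window, the negator list and the polarity table are illustrative assumptions and are not the method of the cited works.

```python
# Hypothetical negators and prior polarities, for illustration only.
NEGATORS = {"not", "no", "never"}
PRIOR_POLARITY = {"excellent": 1, "good": 1, "bad": -1, "poor": -1}


def contextual_polarity(text):
    """Sum word polarities, flipping a word directly preceded by a negator."""
    tokens = text.lower().split()
    score = 0
    for index, token in enumerate(tokens):
        polarity = PRIOR_POLARITY.get(token, 0)
        if index > 0 and tokens[index - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score
```

Here "excellent printer" scores +1 while "not excellent" scores -1, which a lookup of prior polarity alone would miss. Negation that acts over a longer distance needs a wider window or syntactic analysis.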


Sentiment analysis is carried out in various fields,
such as movie reviews, product reviews and travel
reviews (Liu et al. 2013; Pang et al. 2002; Ye
et al. 2009). Methods based on lexical analysis
and machine learning are the most widely used
methods for sentiment classification.


Sentiment Classification using Machine
Learning Methods:

Machine learning methods develop an algorithm
that improves the performance of the
system by learning from example data. The
solution that machine learning provides for
sentiment analysis has two steps: (i) learn
or train the model from training data and (ii)
classify unseen data using the trained model
(Khairnar & Kinikar 2013). Machine learning
algorithms can be divided into the following
categories:


(a) Supervised learning 
(b) Semi-supervised learning 
(c) Unsupervised learning 


In supervised learning, a model is built using
training data. It is similar to a teacher supervising
the learning process of students (Brownlee
2016). It trains the model to produce some
kind of output. The training data contains the class
label, so the class labels of the output are also known. If
more labelled data is provided as input, the
output will be more precise. If the output deviates
from the expected result, the model is built again
using more labelled data. One limitation of
supervised learning is that it gives a wrong or
unknown result if the input data is not labelled.


In unsupervised learning, the input data is unlabelled,
with no corresponding output. The algorithm
discovers similar patterns in the data and
provides output by grouping similar items into
one category. Clustering is one of the
unsupervised learning methods, and it works
efficiently. Clustering identifies similar groups of
data in the data set (Kaushik 2016).


In semi-supervised learning, the benefits of both
supervised and unsupervised learning are
incorporated. Semi-supervised learning models are
generally applied to data sets in which only some
of the data is labelled and the rest is unlabelled.
This method is generally used where data
collection is cheap and data labelling is costly.


Yogeesh and Antoreep (2020) performed
sentiment analysis on Twitter data using machine
learning and deep learning models.


R.R. Kalangi et al. (2021) proposed machine
learning based techniques for sentiment analysis
of airline reviews.


S. Sindhu and S. Kumar (2023) presented a survey
of different machine learning and deep learning
techniques used in sentiment analysis.


3. SYSTEM DESIGN AND
IMPLEMENTATION


The process involved in sentiment mining of 
product reviews can be given as: 


1. Data collection: Gather product reviews
from various sources such as e-commerce
websites, social media platforms, forums or
dedicated review websites.


2. Preprocessing: Clean the text data by
removing irrelevant information such as
HTML tags, punctuation and stop words.
Tokenization and stemming can also be
applied to normalize the text.


E-ISSN: 1817-3195


3. Sentiment analysis techniques: Apply
one of the sentiment analysis techniques.


4. Sentiment classification: Classify each
review into predefined sentiment
categories such as positive, negative or
neutral.


5. Evaluation: Assess the performance of
the sentiment analysis model using
metrics such as accuracy, precision,
recall, F1-score or a confusion matrix.


6. Post-processing and visualization:
Analyze the results, identify trends, and
visualize the sentiment distribution using
techniques like word clouds, bar charts
or sentiment heatmaps.
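
The evaluation metrics named in step 5 can be computed from the four cells of a binary confusion matrix. The following Python sketch covers only the two-class case (positive versus everything else). The function name `evaluate` and its `positive_label` parameter are hypothetical and are not part of the system described here.

```python
def evaluate(true_labels, predicted_labels, positive_label="positive"):
    """Compute accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        raise ValueError("at least one labelled example is required")
    # Cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    tn = sum(1 for t, p in pairs if t != positive_label and p != positive_label)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}
```

For a three-class setting (positive, negative, neutral), the same function could be called once per class and the results averaged.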


The following figure gives the basic workflow
followed for opinion mining of customer reviews
on e-commerce websites. Customer reviews from
various e-commerce sites like Amazon, Flipkart
etc. can be extracted using the APIs provided by
these sites. Fig. 1 shows the block diagram of
the system for sentiment mining.


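[Figure: block diagram. Block labels: Extract Reviews; .csv file; Convert to UTF format; Remove Punctuation marks; Remove stopwords; Build Corpus; Positive words; Negative Words; Build positive word cloud; Build negative word cloud]


Fig 1. Block Diagram for Sentiment Mining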


Input:- URL of customer reviews of the e-commerce website
Output:- Summary of sentiments extracted from customer reviews.
Steps:
1. Crawl the e-commerce website URL for the given product to extract the customer reviews. The
   customer reviews' text and rating for the product are extracted and stored in a data frame.
2. Convert the extracted text to UTF format and remove content present in other formats.
3. Remove punctuation marks and white spaces from the extracted text.
4. Remove English stop words like a, an, the, this, that etc.
5. Build the term document matrix.
6. Build a corpus of all the words in the reviews.
7. Compare the corpus with the positive word list and identify all positive words of the product reviews.
8. Compare the corpus with the negative word list and identify all negative words of the product reviews.
9. Build the positive word cloud and the negative word cloud.
10. Generate a summary of sentiments as a chart.


Fig 2. Algorithm for sentiment mining 
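
The text-processing steps of the algorithm above (cleaning, stop word removal, term document matrix, and matching against positive and negative word lists) can be sketched in Python as follows. This is not the authors' code, which is written in R. The word lists are tiny hypothetical samples, the function names are assumptions, and discarding non-ASCII characters is used only as a simple stand-in for the UTF conversion step. Crawling and word cloud rendering are left out.

```python
import re
import string
from collections import Counter

# Tiny hypothetical word lists; real lists would be far larger.
STOP_WORDS = {"a", "an", "the", "this", "that", "is", "it",
              "and", "but", "of", "to"}
POSITIVE_WORDS = {"good", "great", "excellent", "easy", "fast"}
NEGATIVE_WORDS = {"bad", "poor", "slow", "jam", "expensive"}


def clean(review):
    """Drop non-ASCII characters, lowercase, strip punctuation and extra spaces."""
    text = review.encode("ascii", errors="ignore").decode("ascii").lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def tokenize(review):
    """Split a cleaned review into words and remove stop words."""
    return [word for word in clean(review).split() if word not in STOP_WORDS]


def term_document_matrix(reviews):
    """Return {term: [count in review 0, count in review 1, ...]}."""
    docs = [Counter(tokenize(review)) for review in reviews]
    terms = sorted(set().union(*docs))
    return {term: [doc[term] for doc in docs] for term in terms}


def sentiment_frequencies(reviews):
    """Total the frequency of each positive and each negative word."""
    matrix = term_document_matrix(reviews)
    totals = {term: sum(counts) for term, counts in matrix.items()}
    positive = {t: n for t, n in totals.items() if t in POSITIVE_WORDS}
    negative = {t: n for t, n in totals.items() if t in NEGATIVE_WORDS}
    return positive, negative


# Hypothetical reviews used only to exercise the sketch.
SAMPLE_REVIEWS = ["Great printer, easy to set up!",
                  "The ink is expensive but print quality is great.",
                  "Paper jam again... poor design"]
```

The two frequency dictionaries returned by `sentiment_frequencies` are the kind of data from which word clouds and frequency charts could be drawn.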


4. RESULTS


As part of the implementation, the Amazon website is
used for the product “HP DeskJet Color Printer”. The
Amazon customer reviews page is
accessed to extract the reviews for the product
“HP DeskJet Color Printer”. The first 5000 customer
reviews are extracted and sentiment mining is
performed. The algorithm for sentiment mining is
shown in Fig. 2. The algorithm is implemented
using code written in the R programming language.
Fig 3: Positive word cloud for the reviews of HP DeskJet Color Printer
Fig 4: Negative word cloud for the reviews of HP DeskJet Color Printer


Positive and negative word clouds are built from
the customer reviews after removing stop words.
The positive and negative word clouds for the
given product are depicted in Fig. 3 and Fig. 4.
Words that occur more frequently appear
highlighted in the clouds, and words that are not
repeated much appear in a small size.

The frequencies of the various positive words and
negative words are depicted in Fig. 5 and Fig. 6. It
is observed that the frequency of positive words is in
the range of 500-1000, while the frequency of negative
words is in the range of 0-200. Considering the
overall frequencies of all sentiment words,
positive words are more frequent. So, the customer can see
that most of the reviews are positive, and the most
frequently mentioned positive features of the product can be
presented to the customer. The customer can thus identify
the overall sentiment of the product reviews.
Fig 6: Frequency of Negative Words in Reviews


5. CONCLUSION


E-commerce sites have become very popular
during the Covid-19 pandemic era, and customers
increasingly purchase products through e-commerce
websites. Customers wish to see feedback or reviews
from previous customers of a product before
making a purchase. The sentiment analysis system
developed here helps customers to draw a
conclusion from the customer reviews and so to take a
decision before purchasing a product. E-commerce
websites are also faced with fake customer
reviews. So, the proposed system can be extended
in future to consider only real customer reviews
and ignore fake customer reviews while
identifying the overall sentiment of a given product.
The current sentiment analysis techniques are
unable to identify sarcasm present in customer
reviews. Future research can focus on this area.